Semantic Website Clustering

Authors

  • I-Hsuan Yang
  • Yu-tsun Huang
  • Yen-Ling Huang
Abstract

We propose a new approach to clustering web pages. Using an iterative reinforced algorithm, the model extracts semantic feature vectors from user click-through data. We then apply LSA (Latent Semantic Analysis) to reduce the feature dimensionality and the K-means algorithm to cluster the documents. Compared with the traditional feature extraction method (a lexical binomial model), our model achieves better purity (75%) and F-measure (52%). Combining the features from both methods reaches a purity of 82% and an F-measure of 52%. The same method can also be used to cluster queries, yielding a purity of 74% and an F-measure of 43%.
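The abstract reports purity as one of its evaluation measures but does not define it. The following is a minimal sketch, assuming the standard definition of clustering purity (each cluster is credited with its most frequent true class, and the credited counts are divided by the number of items). The function name `purity` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Return clustering purity under the standard definition (an assumption here).

    Each cluster is credited with the count of its most frequent true label;
    the sum of those counts is divided by the total number of items.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Credit every cluster with the size of its majority class.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: 8 pages, 3 clusters, 3 hand-assigned categories.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
labels = ["news", "news", "shop", "shop", "shop", "shop", "blog", "news"]
print(purity(clusters, labels))  # 0.75
```

In this toy case the majority classes cover 2, 3, and 1 items of the three clusters, so purity is 6/8. Purity tends to rise as the number of clusters grows, which is one common reason to report it alongside a second measure such as the F-measure.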


Related resources

Word clustering effect on vocabulary learning of EFL learners: A case of semantic versus phonological clustering

The aim of this study is to determine the effect of word clustering method on vocabulary learning of Iranian EFL learners through a case of semantic versus phonological clustering. To this effect, 80 homogeneous students from four intermediate classes at an English institute in Torbat e Heydariyeh participated in this research. They were assigned to four groups according to semantic versus phon...


The Impact of Semantic Clustering on Iranian EFL Advanced Learners’ Vocabulary Retention

This study investigated the impact of semantic clustering on Iranian EFL learners’ vocabulary retention at advanced level. Participants were female learners randomly assigned to two groups of 15. Four instruments (TOEFL test; vocabulary pretest; immediate posttest, and delayed recall posttest) were used. The experimental group underwent semantic clustering vocabulary presentation in which the l...


A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...


Efficient Clustering of Web Search Results Using Enhanced Lingo Algorithm

Web query optimization is the focus of recent research and development efforts. To fetch the required information, the users are using search engines and sometimes through the website interfaces. One approach is search engine optimization which is used by the website developers to popularize their website through the search engine results. Clustering is a main task of explorative data mining pr...


Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems

Ontology is the main infrastructure of the Semantic Web which provides facilities for integration, searching and sharing of information on the web. Development of ontologies as the basis of semantic web and their heterogeneities have led to the existence of ontology matching. By emerging large-scale ontologies in real domain, the ontology matching systems faced with some problem like memory con...




Publication date: 2007